Abstract

Many natural processes occur over characteristic spatial and temporal scales. This paper presents tools for (i) flexibly and scalably coarse-graining cellular automata and (ii) identifying which coarse-grainings express an automaton's dynamics well, and which express its dynamics badly. We apply the tools to investigate a range of examples in Conway's Game of Life and Hopfield networks and demonstrate that they capture some basic intuitions about emergent processes. Finally, we formalize the notion that a process is emergent if it is better expressed at a coarser granularity.



Introduction 

Biological systems are studied across a range of spatiotemporal scales, for example as collections of atoms, molecules, cells, and organisms (Anderson 1972). However, not all scales express a system's dynamics equally well. This paper proposes a principled method for identifying which spatiotemporal scale best expresses a cellular automaton's dynamics. We focus on Conway's Game of Life and Hopfield networks as test cases where collective behavior arises from simple local rules.

Conway's Game of Life is a well-studied artificial system with interesting behavior at multiple scales (Berlekamp et al. 1982). It is a 2-dimensional grid whose cells are updated according to deterministic rules. Remarkably, a sufficiently large grid can implement any deterministic computation. Designing patterns that perform sophisticated computations requires working with distributed structures such as gliders and glider guns rather than individual cells (Dennett 1991). This suggests grid computations may be better expressed at coarser spatiotemporal scales.

The first contribution of this paper is a coarse-graining procedure for expressing a cellular automaton's dynamics at different scales. We begin by considering cellular automata as collections of spacetime coordinates termed occasions (cell $n_i$ at time $t$). Coarse-graining groups occasions into structures called units. For example, a unit could be a 3×3 patch of grid containing a glider at time $t$. Units do not have to be adjacent to one another; they interact through the channel: transparent occasions whose outputs are marginalized over. Finally, some occasions are set as ground, which fixes the initial condition of the coarse-grained system.

Gliders propagate at 1/4 diagonal square per tic, so a glider needs $4n$ tics to cross $n$ diagonal squares; more generally, no signal can travel faster than one cell per tic (the grid's "speed of light"), so units more than $n$ cells apart cannot interact within $n$ tics. These limits constrain which coarse-grainings can express glider dynamics. It is also intuitively clear that units should group occasions concentrated in space and time rather than scattered occasions that have nothing to do with each other. In fact, it turns out that most coarse-grainings express a cellular automaton's dynamics badly.

The second contribution of this paper is a method for distinguishing good coarse-grainings from bad based on the following principle:

• Coarse-grainings that generate more information, relative to their sub-grainings, better express an automaton's dynamics than those generating less.

We introduce two measures to quantify the information generated by coarse-grained systems. Effective information, ei, quantifies how selectively a system's output depends on its input: it is high if few inputs cause the output, and low if many do. Excess information, $\xi$, measures the difference between the information generated by a system and by its subsystems.

With these tools in hand we investigate coarse-grainings of Game of Life grids and Hopfield networks and show that grainings with high ei and $\xi$ capture our basic intuitions regarding emergent processes. For example, excess information distinguishes boring (redundant) from interesting (synergistic) information-processing, exemplified by blank patches of grid and gliders respectively.

Finally, the penultimate section converts our experience with examples in the Game of Life and Hopfield networks into a provisional formalization of the principle above. Roughly, we define a process as emergent if it is better expressed at a coarser scale.

The principle states that emergent processes are more than the sum of their parts, in agreement with many other approaches to quantifying emergence (Crutchfield 1994; Tononi 2004; Polani 2006; Shalizi and Moore 2006; Seth 2010). Two points distinguishing our approach from prior work are worth emphasizing. First, coarse-graining is scalable: coarse-graining a cellular automaton yields another cellular automaton. Prior works identify macro-variables such as temperature (Shalizi and Moore 2006) or centre-of-mass (Seth 2010) but do not show how to describe a system's dynamics purely in terms of these macro-variables. By contrast, an emergent coarse-graining is itself a cellular automaton, whose dynamics are computed via the mechanisms of its units and their connectivity (see below).

Second, our starting point is selectivity rather than predictability. Assessing predictability necessitates building a model and deciding what to predict. Although emergent variables may be robust against model changes (Seth 2010), it is unsatisfying for emergence to depend on properties of both the process and the model. By contrast, effective and excess information depend only on the process: the mechanisms, their connectivity, and their output. A process is then emergent if its internal dependencies are best expressed at coarse granularities.

Probabilistic cellular automata 

Concrete examples. This paper considers two main examples of cellular automata: Conway's Game of Life and Hopfield networks (Hopfield 1982).

The Game of Life is a grid of deterministic binary cells. A cell outputs 1 at time $t$ iff (i) exactly three of its neighbors output 1s at $t-1$ or (ii) it and exactly two neighbors output 1s at $t-1$.
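A minimal sketch of this update rule, assuming a finite grid whose off-grid cells always output 0 and reading "three" and "two" as exact counts (the standard B3/S23 rule). The function name `life_step` is ours, not the paper's.

```python
from itertools import product


def life_step(grid):
    """Apply one Game of Life tic to a finite 0/1 grid (list of lists).

    Cells outside the grid are treated as permanently 0 (an assumption).
    A cell outputs 1 at time t iff (i) exactly three neighbours output 1 at t-1,
    or (ii) it and exactly two neighbours output 1 at t-1.
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        return sum(
            grid[r + dr][c + dc]
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )

    def next_state(r, c):
        n = live_neighbours(r, c)
        return 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0

    return [[next_state(r, c) for c in range(cols)] for r in range(rows)]


# Hypothetical usage: a blinker (horizontal bar of three cells) turns vertical.
blinker = [[0] * 5 for _ in range(5)]
for c in (1, 2, 3):
    blinker[2][c] = 1
vertical = life_step(blinker)
print([row[2] for row in vertical])  # [0, 1, 1, 1, 0]
```

Applying `life_step` twice returns the original horizontal bar, the period-2 oscillation expected of a blinker.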



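In a Hopfield network (Amit 1989), cell $n_k$ fires with probability proportional to

$$p(n_{k,t} = 1 \mid n_{\bullet,t-1}) \propto \exp\Big(\frac{1}{T}\sum_{j \to k} \alpha_{jk}\, n_{j,t-1}\Big). \qquad (1)$$

Temperature $T$ controls network stochasticity. Attractors $\{\xi^1, \ldots, \xi^N\}$ are embedded into a network by setting the connectivity matrix as $\alpha_{jk} = \sum_{\mu} (2\xi^\mu_j - 1)(2\xi^\mu_k - 1)$.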
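A sketch of Eq. (1) and the embedding rule, under two assumptions that the text does not state: Eq. (1) is read as a two-state distribution in which not firing has weight 1 (so the firing probability is the logistic function of the field divided by $T$), and self-connections $\alpha_{kk}$ are set to zero. The function names are hypothetical.

```python
import math
import random


def embed(attractors):
    """Connectivity alpha[j][k] = sum_mu (2*xi_j - 1)(2*xi_k - 1) for 0/1 patterns.

    Self-connections are set to zero (an assumption, not stated in the text).
    """
    n = len(attractors[0])
    alpha = [[0] * n for _ in range(n)]
    for xi in attractors:
        s = [2 * b - 1 for b in xi]  # map {0, 1} to {-1, +1}
        for j in range(n):
            for k in range(n):
                if j != k:
                    alpha[j][k] += s[j] * s[k]
    return alpha


def fire_probability(alpha, prev, k, T):
    """P(n_{k,t} = 1 | n_{.,t-1} = prev), assuming the non-firing state has weight 1."""
    field = sum(alpha[j][k] * prev[j] for j in range(len(prev)))
    return 1.0 / (1.0 + math.exp(-field / T))


def step(alpha, prev, T, rng=random):
    """Update every cell synchronously from the previous state."""
    return [int(rng.random() < fire_probability(alpha, prev, k, T)) for k in range(len(prev))]


# The three attractors listed in Table 1.
patterns = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
alpha = embed(patterns)
probs = [fire_probability(alpha, patterns[0], k, T=0.25) for k in range(8)]
print([round(p, 3) for p in probs])  # [0.0, 0.0, 0.0, 0.0, 0.982, 0.982, 0.982, 0.982]
```

Under these assumptions and $T = 0.25$, the cells that are on in 00001111 fire again with probability about 0.98 and the others stay silent.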

Abstract definition. A cellular automaton is a finite directed graph $X$ with vertices $V_X = \{v_1, \ldots, v_n\}$. Vertices are referred to as occasions; they correspond to spacetime coordinates in concrete examples. Each occasion $v_l \in V_X$ is equipped with a finite output alphabet $A_l$ and a Markov matrix (or mechanism) $p_l(a_l \mid s_l)$, where $s_l \in S_l := \prod_{k \to l} A_k$, the combined alphabet of the occasions targeting $v_l$. The mechanism specifies the probability that occasion $v_l$ chooses output $a_l$ given input $s_l$. The input alphabet of the entire automaton $X$ is the product of the alphabets of its occasions, $X^{in} := \prod_{l \in V_X} A_l$. The output alphabet is $X^{out} = X^{in}$.
Remark. The input $X^{in}$ and output $X^{out}$ alphabets are distinct copies of the same set. Inputs are causal interventions imposed via Pearl's $do(-)$ calculus (Pearl 2000). The probability of output $a_l$ is computed via the Markov matrix: $p_l(a_l \mid do(s_l))$. The $do(-)$ is not included in the notation explicitly to save space. However, it is always implicit when applying any Markov matrix.

A Hopfield network over time interval $[\alpha, \beta]$ is an abstract automaton. Occasions are spacetime coordinates, e.g. $v_l = n_{i,t}$, cell $i$ at time $t$. An edge connects $v_k \to v_l$ if there is a connection from $v_k$'s cell to $v_l$'s and the time coordinates are $t-1$ and $t$ respectively for some $t$. The mechanism is given by Eq. (1). Occasions at $t = \alpha$, with no incoming edges, can be set as fixed initial conditions or noise sources. Similar considerations apply to the Game of Life.
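To illustrate this construction, the sketch below unrolls a network of cells over a time interval into occasions and edges. The cell graph in the example (six cells in a row, each connected to its immediate neighbors) and the function name `unroll` are hypothetical.

```python
def unroll(cells, connections, t_start, t_end):
    """Return the occasions (cell, t) and edges (j, t-1) -> (k, t) of the automaton.

    `connections` lists (j, k) pairs meaning cell j feeds cell k.
    Occasions at t = t_start receive no edges: they are initial conditions or noise.
    """
    occasions = [(c, t) for t in range(t_start, t_end + 1) for c in cells]
    edges = [((j, t - 1), (k, t))
             for t in range(t_start + 1, t_end + 1)
             for j, k in connections]
    return occasions, edges


# Hypothetical example: six cells in a row, each feeding its immediate neighbours.
cells = range(6)
connections = [(i, i + 1) for i in range(5)] + [(i + 1, i) for i in range(5)]
occasions, edges = unroll(cells, connections, -6, 0)
print(len(occasions), len(edges))  # 42 60
```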

Non-Markovian automata (whose outputs depend on inputs over multiple time steps) have edges connecting occasions separated by more than one time step.

Coarse-graining

Define a subsystem $X$ of cellular automaton $Y$ as a subgraph containing a subset of $Y$'s vertices and a subset of the edges targeting those vertices. We show how to coarse-grain $X$.

Definition (coarse-graining). Let $X$ be a subsystem of $Y$. The coarse-graining algorithm detailed below takes $X \subset Y$ and data $\mathcal{K}$ as arguments, and produces a new cellular automaton $X_{\mathcal{K}}$. Data $\mathcal{K}$ consists of (i) a partition of $X$'s occasions $V_X = G \cup C \cup U_1 \cup \cdots \cup U_N$ into ground $G$, channel $C$ and units $U_1, \ldots, U_N$, and (ii) ground output $s^*_G$.

Vertices of automaton $X_{\mathcal{K}}$, the new coarse-grained occasions, are units: $V_{X_{\mathcal{K}}} := \{U_1, \ldots, U_N\}$. The directed graph of $X_{\mathcal{K}}$ is computed in Step 4 and the alphabets $A_l$ of units $U_l$ are computed in Step 5. Computing the Markov matrices (mechanisms) of the units takes all five steps.

The ground specifies occasions whose outputs are fixed: the initial condition $s^*_G$. The channel specifies unobserved occasions: interactions between units propagate across the channel. Units are macroscopic occasions whose interactions are expressed by the coarse-grained automaton. Fig. 1 illustrates coarse-graining a simple automaton.

There are no restrictions on partitions. For example, although the ground is intended to provide the system's initial condition, it can contain any spacetime coordinates, so that in pathological cases it may obstruct interactions between units. Distinguishing good coarse-grainings from bad is postponed to later sections.

Algorithm. Apply the following steps to coarse-grain:

Step 1. Marginalize over extrinsic inputs.

External inputs are treated as independent noise sources; we are only interested in internal information-processing. An occasion's input alphabet decomposes into a product $S_l = S_l^{X} \times S_l^{\bar X}$ of inputs from within and without the system. For each occasion $v_l \in V_X$, marginalize over external outputs using the uniform distribution:

$$p_l(a_l \mid s_l^{X}) := \sum_{s_l^{\bar X} \in S_l^{\bar X}} p_l(a_l \mid s_l^{X}, s_l^{\bar X}) \cdot p_{unif}(s_l^{\bar X}). \qquad (2)$$
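A minimal sketch of Step 1, assuming mechanisms are given as Python functions that return the probability of an output given internal and external inputs. Averaging over all external inputs is the same as weighting them by the uniform distribution. The names below are hypothetical.

```python
from itertools import product


def marginalize_extrinsic(mech, ext_alphabets):
    """Return p(a | s_int) = sum over s_ext of mech(a, s_int, s_ext) * p_unif(s_ext)."""
    ext_inputs = list(product(*ext_alphabets))

    def marginal(a, s_int):
        return sum(mech(a, s_int, s_ext) for s_ext in ext_inputs) / len(ext_inputs)

    return marginal


# Hypothetical occasion computing AND of one internal and one external input.
def and_gate(a, s_int, s_ext):
    return 1.0 if a == (s_int[0] & s_ext[0]) else 0.0


p = marginalize_extrinsic(and_gate, ext_alphabets=[(0, 1)])
print(p(1, (1,)), p(1, (0,)))  # 0.5 0.0
```

Once the external input becomes noise, an internal 1 makes the occasion fire only half the time.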
Figure 1: (A) An automaton of 6 cells connected to their immediate neighbors. (B) The directed graph of occasions over time interval $[-6, 0]$. Green occasions are ground. Red and blue occasions form two units. Other occasions are channel. (C) Edges whose signals do not reach the blue unit have no effect. (D) The coarse-grained system consists of two units (macro-occasions).



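Step 2. Fix the ground.

Ground outputs are fixed in the coarse-grained system. Graining $\mathcal{K}$ imposes a second decomposition onto $v_l$'s input alphabet, $S_l^{X} = S_l^{G} \times S_l^{C} \times S_l^{U}$ where $U = \bigcup_k U_k$. Subsume the ground into $v_l$'s mechanism by specifying

$$p_l(a_l \mid s_l^{C}, s_l^{U}) := p_l(a_l \mid s^*_G|_l, s_l^{C}, s_l^{U}),$$

where $s^*_G|_l$ is the ground output restricted to the ground occasions targeting $v_l$.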
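Fixing the ground amounts to plugging the ground outputs into a mechanism and keeping the remaining inputs free. The sketch assumes a Game of Life occasion whose inputs are ordered as (self, eight neighbors), with six neighbors fixed to a blank (0) ground; the helper names are hypothetical.

```python
from itertools import product


def life_mechanism(a, inputs):
    """Deterministic Game of Life occasion; inputs = (self, n1, ..., n8)."""
    me, neighbours = inputs[0], sum(inputs[1:])
    out = 1 if neighbours == 3 or (me == 1 and neighbours == 2) else 0
    return 1.0 if a == out else 0.0


def fix_ground(mech, ground, n_inputs):
    """Subsume the ground into mech; `ground` maps input positions to fixed outputs.

    The returned mechanism takes the remaining inputs in their original order.
    """
    def fixed(a, free_inputs):
        free = iter(free_inputs)
        full = tuple(ground[i] if i in ground else next(free) for i in range(n_inputs))
        return mech(a, full)

    return fixed


# Hypothetical grounding: six of the eight neighbours are blank; self and two neighbours stay free.
p = fix_ground(life_mechanism, {i: 0 for i in range(3, 9)}, n_inputs=9)
print([x for x in product((0, 1), repeat=3) if p(1, x) == 1.0])  # [(1, 1, 1)]
```

With a blank ground no birth is possible (it would need three live neighbors), so the grounded occasion fires only when it and both free neighbors are on.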



Step 3. Marginalize over the channel.

The channel specifies transparent occasions. Perturbations introduced into units propagate through the channel until they reach other units where they are observed. Transparency is imposed by marginalizing over the channel occasions in the product mechanism

$$p_{\mathcal{K}}\big(x^{\mathcal{K}}_{out} \mid x^{\mathcal{K}}_{in}\big) := \sum_{x^{C}_{out}} \prod_{l \in C \cup U} p_l\big(a_l \mid s^{C}_l, s^{U}_l\big), \qquad (3)$$

where superscripts denote that inputs and outputs are restricted, for $\mathcal{K}$, to occasions in units in $\mathcal{K}$ (since the channel is summed over and the ground is already fixed) and, for each $l$, to the inputs and outputs of occasion $v_l$.

For example, consider a cellular automaton with graph $v_a \to v_b \to v_c$ and product mechanism $p(c \mid b)\,p(b \mid a)\,p(a)$. Setting $v_b$ as channel and marginalizing yields the coarse-grained mechanism $\sum_b p(c \mid b)\,p(b \mid a)\,p(a) = p(c \mid a)\,p(a)$. The channel is rendered transparent, and the new mechanism $p(c \mid a)$ convolves $p(c \mid b)$ and $p(b \mid a)$.
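The chain example can be checked numerically, because marginalizing over the channel occasion $v_b$ is a product of Markov matrices. The sketch assumes binary alphabets and matrices stored row-wise as `P[input][output]`; the particular matrices are hypothetical.

```python
def marginalize_channel(p_c_given_b, p_b_given_a):
    """p(c | a) = sum_b p(c | b) p(b | a): the channel occasion v_b becomes transparent."""
    n_a, n_b, n_c = len(p_b_given_a), len(p_c_given_b), len(p_c_given_b[0])
    return [[sum(p_b_given_a[a][b] * p_c_given_b[b][c] for b in range(n_b))
             for c in range(n_c)]
            for a in range(n_a)]


# Hypothetical mechanisms: v_b negates v_a; v_c copies v_b but flips it with probability 0.1.
p_b_given_a = [[0.0, 1.0], [1.0, 0.0]]
p_c_given_b = [[0.9, 0.1], [0.1, 0.9]]
print(marginalize_channel(p_c_given_b, p_b_given_a))  # [[0.1, 0.9], [0.9, 0.1]]
```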

Step 4. Compute the effective graph of coarse-graining $X_{\mathcal{K}}$.

The micro-alphabet of unit $U_l$ is $A_l := \prod_{k \in U_l} A_k$. The mechanism of $U_l$ is computed as in Eq. (3) with the product restricted to occasions $j \in C \cup U_l$, thus obtaining $p_{U_l}(a_l \mid x_{in})$ where $a_l \in A_l$.

Two units $U_k$ and $U_l$ are connected by an edge if the outputs of $U_k$ make a difference to the behavior of $U_l$. More precisely, we draw an edge if $\exists\, a_k, a'_k \in A_k$ such that

$$p_{U_l}(a_l \mid x_{\bar k}, a_k) \neq p_{U_l}(a_l \mid x_{\bar k}, a'_k) \quad \text{for some } a_l \in A_l.$$

Here, $x_{\bar k}$ denotes the input from all units other than $U_k$.

The effective graph need not be acyclic. Intervening via the $do(-)$ calculus allows us to work with cycles.
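A brute-force sketch of the edge test in Step 4, assuming each unit's mechanism is a function of its own output and a dict holding every unit's previous output. It varies one unit's output while holding all others fixed, so the condition is checked for every $x_{\bar k}$. The example units are hypothetical and include a two-unit cycle.

```python
from itertools import product


def effective_edges(alphabets, mechanisms, tol=1e-12):
    """Return the pairs (k, l) such that U_k's output changes U_l's mechanism.

    alphabets:  {unit: list of outputs}
    mechanisms: {unit: f(a_l, x) -> probability}, where x = {unit: output}
    """
    units = list(alphabets)
    all_inputs = [dict(zip(units, combo)) for combo in product(*(alphabets[u] for u in units))]
    edges = set()
    for k, l in product(units, repeat=2):
        for x in all_inputs:
            for a_k in alphabets[k]:
                y = {**x, k: a_k}  # same input except for U_k's output
                if any(abs(mechanisms[l](a, x) - mechanisms[l](a, y)) > tol
                       for a in alphabets[l]):
                    edges.add((k, l))
    return edges


# Hypothetical units: U1 and U2 copy each other (a cycle); U3 ignores everything.
alphabets = {"U1": [0, 1], "U2": [0, 1], "U3": [0, 1]}
mechanisms = {
    "U1": lambda a, x: float(a == x["U2"]),
    "U2": lambda a, x: float(a == x["U1"]),
    "U3": lambda a, x: 0.5,
}
print(sorted(effective_edges(alphabets, mechanisms)))  # [('U1', 'U2'), ('U2', 'U1')]
```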

Step 5. Compute macro-alphabets of units in $X_{\mathcal{K}}$.

Coarse-graining can eliminate low-level details. Outputs that are distinguishable at the base level may not be after coarse-graining. This can occur in two ways. Outputs $b$ and $b'$ have indistinguishable effects if $p(a \mid b, c) = p(a \mid b', c)$ for all $a$ and $c$. Alternatively, two outputs react indistinguishably if $p(b \mid c) = p(b' \mid c)$ for all $c$.

More precisely, two outputs $u_l$ and $u'_l$ of unit $U_l$ are equivalent, denoted $u_l \sim u'_l$, iff

$$p_{\mathcal{K}}(x_{out} \mid x_{\bar l}, u_l) = p_{\mathcal{K}}(x_{out} \mid x_{\bar l}, u'_l) \quad \text{and} \quad p_{U_l}(u_l \mid x_{in}) = p_{U_l}(u'_l \mid x_{in}) \quad \text{for all } x_{out}, x_{in}.$$

Picking a single element from each equivalence class obtains the macro-alphabet $A_l$ of the unit $U_l$. The mechanism of $U_l$ is $p_{U_l}$ (Step 4) restricted to macro-alphabets.
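Step 5 amounts to grouping micro-outputs by a signature of their effects and causes. In the hypothetical example below, a two-cell unit driven by uniform noise feeds a target that reads only the parity of the two cells, so the four micro-outputs collapse into two macro-outputs. The signature functions stand in for $p_{\mathcal{K}}(x_{out} \mid x_{\bar l}, u_l)$ and $p_{U_l}(u_l \mid x_{in})$ and are assumptions of the sketch.

```python
from itertools import product


def macro_alphabet(micro, effect_signature, cause_signature, digits=12):
    """Group micro-outputs whose effects and causes coincide.

    Returns (one representative per equivalence class, the classes).
    """
    classes = {}
    for u in micro:
        key = (tuple(round(p, digits) for p in effect_signature(u)),
               tuple(round(p, digits) for p in cause_signature(u)))
        classes.setdefault(key, []).append(u)
    groups = list(classes.values())
    return [g[0] for g in groups], groups


def effect(u):
    """Hypothetical target: outputs the parity of the unit's two cells."""
    return [float(t == u[0] ^ u[1]) for t in (0, 1)]


def cause(u):
    """Hypothetical unit driven by uniform noise: every micro-output has probability 1/4."""
    return [0.25]


micro = list(product((0, 1), repeat=2))  # 2-cell unit: 4 micro-outputs
reps, groups = macro_alphabet(micro, effect, cause)
print(reps)    # [(0, 0), (0, 1)]
print(groups)  # [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
```

Rounding the signatures is a design choice for floating-point mechanisms: exact equality would split classes over rounding noise.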

Information 

This section extends prior work to quantify the information generated by a cellular automaton, both as a whole and relative to its subsystems (Balduzzi and Tononi 2008, 2009).

Given subsystem $m$ of $X$, let $p_m(x_{out} \mid x_{in})$, or $m$ for short, denote its mechanism or Markov matrix. The mechanism is computed by taking the Markov matrix of each occasion in $X$, marginalizing over extrinsic inputs (edges not in $X$) as in Eq. (2), and taking the product. It is notationally convenient to write $p_m$ as though its inputs and outputs are $X^{in}$ and $X^{out}$, even though $m$ does not in general contain all occasions in $X$ and therefore treats some inputs and outputs as extrinsic, unexplainable noise. We switch freely between the terms "subsystem" and "submechanism" below.

Effective information quantifies how selectively a mechanism discriminates between inputs when assigning them to an output. Alternatively, it measures how sharp the functional dependencies leading to an output are.

The actual repertoire $\hat p_m(X^{in} \mid x_{out})$ is the set of inputs that cause (lead to) mechanism $m$ choosing output $x_{out}$, weighted by likelihood according to Bayes' rule

$$\hat p_m(x_{in} \mid x_{out}) := \frac{p_m(x_{out} \mid do(x_{in}))}{p(x_{out})} \cdot p_{unif}(x_{in}), \qquad (4)$$

where $p(x_{out}) = \sum_{x_{in}} p_m(x_{out} \mid do(x_{in}))\, p_{unif}(x_{in})$. The $do(-)$ notation and the hat on $\hat p$ remind us that we first intervene to impose $x_{in}$ and then apply Markov matrix $p_m$. For deterministic mechanisms, i.e. functions $f: X^{in} \to X^{out}$, the actual repertoire assigns $\hat p = 1/|f^{-1}(x_{out})|$ to elements of the pre-image and $\hat p = 0$ to other elements of $X^{in}$.
[Figure 2 panels: (A) ei = log2(16) - log2(4) = 2 bits; (B) ei = log2(16) - log2(8) = 1 bit.]

Figure 2: Categorization and information. Cells fire if they receive two or more spikes. The $16 = 2^4$ possible outputs by the top layer are arranged in a grid. (AB): Cells $n_1$ and $n_4$ fire when the output is in the orange and blue regions respectively. Cell $n_1$'s response is more informative than $n_4$'s since it fires for fewer inputs.

The shaded regions in Fig. 2 show outputs of the top layer that cause the bottom cell to fire.

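Effective information generated when $m$ outputs $x_{out}$ is the Kullback-Leibler divergence ($H[p \,\|\, q] = \sum_i p_i \log_2 \frac{p_i}{q_i}$) of the actual repertoire from the uniform distribution:

$$ei(m, x_{out}) := H\big[\hat p_m(X^{in} \mid x_{out}) \,\big\|\, p_{unif}(X^{in})\big]. \qquad (5)$$

Effective information is not a statistical measure: it depends on the mechanism and a particular output $x_{out}$.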

Effective information generated by deterministic function $f$ is $ei(f, x_{out}) = \log_2 \frac{|X^{in}|}{|f^{-1}(x_{out})|}$, where $|\cdot|$ denotes cardinality. In Fig. 2, ei is the logarithm of the ratio of the total number of squares to the number of shaded squares.
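Eqs. (4) and (5) can be sketched directly. The uniform prior cancels in Bayes' rule, so the actual repertoire is the normalized likelihood of $x_{out}$ under each intervention. The example mechanism (a cell that fires iff two particular cells of a 4-cell top layer spike) is hypothetical; it reproduces the deterministic formula $\log_2(16/4) = 2$ bits.

```python
import math
from itertools import product


def actual_repertoire(mech, inputs, x_out):
    """Eq. (4): p(x_out | do(x_in)) * p_unif(x_in) / p(x_out), over all inputs."""
    likelihood = [mech(x_out, x_in) for x_in in inputs]
    total = sum(likelihood)  # proportional to p(x_out); the uniform prior cancels
    if total == 0:
        raise ValueError("the mechanism never produces x_out")
    return [w / total for w in likelihood]


def effective_information(mech, inputs, x_out):
    """Eq. (5): KL divergence, in bits, of the actual repertoire from uniform."""
    q = 1.0 / len(inputs)
    return sum(p * math.log2(p / q)
               for p in actual_repertoire(mech, inputs, x_out) if p > 0)


inputs = list(product((0, 1), repeat=4))  # the 16 outputs of a 4-cell top layer


def cell(a, x):
    """Hypothetical bottom cell: fires iff top cells 0 and 1 both spike."""
    return 1.0 if a == int(x[0] + x[1] >= 2) else 0.0


print(effective_information(cell, inputs, 1))  # 2.0
```

The same cell staying silent is far less informative: 12 of the 16 inputs lead to 0, giving $\log_2(16/12) \approx 0.415$ bits.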

Excess information quantifies how much more information a mechanism generates than the sum of its submechanisms, that is, how synergistic the internal dependencies are.

Given a subsystem with mechanism $m$, a partition $\mathcal{P} = \{M^1, \ldots, M^m\}$ of the occasions in $src(m)$, and output $x_{out}$, define excess information as follows. Let $m^j := m \cap (M^j \times X)$ be the restriction of $m$ to sources in $M^j$. Excess information over $\mathcal{P}$ is

$$\xi(m, \mathcal{P}, x_{out}) := ei(m, x_{out}) - \sum_j ei(m^j, x_{out}). \qquad (6)$$

j 

Excess information (sans partition) is computed over the 
information-theoretic weakest link 'p^^^P 

^{m,Xout) ■.^i{m,V^"'',Xout)- (7) 

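Let $A_{M^j} := \prod_{l \in M^j} A_l$. The minimum information partition¹ $\mathcal{P}^{MIP}$ minimizes normalized excess information:

$$\mathcal{P}^{MIP} := \arg\min_{\mathcal{P}} \frac{\xi(m, \mathcal{P}, x_{out})}{N_{\mathcal{P}}}, \quad \text{where} \quad N_{\mathcal{P}} := (m - 1) \cdot \min_j \{\log_2 |A_{M^j}|\}.$$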



Excess information is negative if any decomposition of 
the system generates more information than the whole. 
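The sketch below computes Eqs. (6) and (7) for small systems of binary source occasions, restricting attention to bipartitions. Following the text, a submechanism $m^j$ is obtained by averaging each target occasion's mechanism over the sources outside $M^j$ (uniform noise, as in Eq. (2)) and then taking the product over targets. The three example systems are hypothetical; they reproduce the qualitative pattern described in this paper: disjoint units give $\xi = 0$, redundant copies give $\xi < 0$, and XOR gives $\xi > 0$.

```python
import math
from itertools import combinations, product


def restricted(targets, n, part):
    """Submechanism m^j: each target averages its inputs outside `part`
    uniformly, then the targets are multiplied together."""
    others = [i for i in range(n) if i not in part]

    def mech(x_out, x_part):
        prob = 1.0
        for a, f in zip(x_out, targets):
            total = 0.0
            for x_other in product((0, 1), repeat=len(others)):
                x = [0] * n
                for i, v in zip(part, x_part):
                    x[i] = v
                for i, v in zip(others, x_other):
                    x[i] = v
                total += f(a, tuple(x))
            prob *= total / 2 ** len(others)
        return prob

    return mech


def ei(mech, n_inputs, x_out):
    """KL divergence (bits) of the actual repertoire from uniform, Eqs. (4)-(5)."""
    likelihood = [mech(x_out, x) for x in product((0, 1), repeat=n_inputs)]
    total, q = sum(likelihood), 1.0 / 2 ** n_inputs
    return sum(w / total * math.log2(w / total / q) for w in likelihood if w > 0)


def excess_information(targets, n, x_out):
    """xi over the bipartition minimizing xi / N_P, with N_P = min_j log2 |A_{M^j}|."""
    whole = ei(restricted(targets, n, tuple(range(n))), n, x_out)
    best = None
    for size in range(1, n // 2 + 1):
        for left in combinations(range(n), size):
            right = tuple(i for i in range(n) if i not in left)
            xi = whole - sum(ei(restricted(targets, n, p), len(p), x_out) for p in (left, right))
            norm = min(len(left), len(right))  # (2 - 1) * min_j log2 2^|M^j| for binary sources
            if best is None or xi / norm < best[0]:
                best = (xi / norm, xi, (left, right))
    return best[1], best[2]


def and_cell(a, x):
    """Hypothetical target occasion computing AND of sources 0 and 1."""
    return float(a == (x[0] & x[1]))


copy = [lambda a, x: float(a == x[0]), lambda a, x: float(a == x[1])]  # disjoint units
print(round(excess_information(copy, 2, (0, 0))[0], 4))                 # 0.0
print(round(excess_information([and_cell, and_cell], 2, (0, 0))[0], 4))  # -0.1411
print(round(excess_information([lambda a, x: float(a == x[0] ^ x[1])], 2, (0,))[0], 4))  # 1.0
```

The redundant case is negative because each restricted submechanism already "explains" the shared output twice over, so the parts together claim more information than the whole generates.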

Fig. 3 shows how two cells taken together can generate the same, less, or more information than their sum taken individually, depending on how their categorizations overlap. Note that the figure decomposes the mechanism of the system over targets rather than sources and so does not depict excess information, which is more useful but harder to illustrate.

Effective information and excess information can be computed for any submechanism of any coarse-graining of any cellular automaton.
Figure 3: Independent, redundant and synergistic information. (AB): Independent. Orthogonal categorizations, orange+pink and blue+pink shadings respectively, by $n_1$ and $n_2$. (C): Partially redundant. Both cells fire; categorizations overlap (pink) more "than expected" and $ei(n_3 n_4, 11) < ei(n_3, 1) + ei(n_4, 1)$. (D): Synergistic. Overlap is less "than expected"; $ei(n_3 n_4, 01) > ei(n_3, 0) + ei(n_4, 1)$.



Application: Conway's Game of Life 

The Game of Life has interesting dynamics at a range of spatiotemporal scales. At the atomic level, each coordinate (cell $i$ at time $t$) is an occasion and information processing is extremely local. At coarser granularities, information can propagate through channels, so that units generate information at a distance. Gliders, for example, are distributed objects that can interact over large distances in space and time (Fig. 4A), and provide an important example of an emergent process (Dennett 1991; Beer 2004).



¹ We restrict to bipartitions to reduce the computational burden.

This section shows how effective and excess information quantifiably distinguish coarse-grainings expressing glider dynamics well from those expressing them badly.

Figure 4: Detecting focal points. (A): A glider moves 1 diagonal square every 4 time steps. (B): Cells in the orange and black outlined 3×3 squares are units at $t = 0$ and $t = -20$ respectively, with $x_{out}$ the glider shown. Cells at $t = -21$ are blank ground; other occasions are channel. Shifting the position of the black square produces a family of coarse-grainings. Effective information is shown as the black square's center varies over the grid.

Effective information detects focal points. Fig. 4A shows a glider trajectory, which passes through 1 diagonal step over 4 tics. Fig. 4B investigates how glider trajectories are captured by coarse-grainings: if there is a glider in the 3×3 orange square at time 0 (Fig. 4B), it must have passed through the black square at $t = -20$ to get there. Are coarse-grainings that respect glider trajectories quantifiably better than those that do not?
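The claim that a glider found in the orange square at $t = 0$ occupied a predictable place at $t = -20$ follows from the glider's speed: in 20 tics it moves 5 diagonal squares. The sketch below checks this with an unbounded-grid implementation of the Game of Life rule; the (row, column) coordinates and the glider's orientation are our choice, not the paper's.

```python
from collections import Counter


def step(live):
    """One Game of Life tic on an unbounded grid; `live` is a set of (row, col)."""
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}  # this orientation moves down and right
state = glider
for _ in range(20):
    state = step(state)
print(state == {(r + 5, c + 5) for r, c in glider})  # True
```

Running the loop backwards in reasoning, a glider observed at $t = 0$ sat 5 squares up and to the left at $t = -20$, which is where the focal point in Fig. 4B appears.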

Fig. 4B fixes occasions in the black square at $t = -20$ and the orange square at $t = 0$ as units (18 total), the ground as blank grid at $t = -21$, and everything else as channel. Varying the spatial location of the black square over the grid, we obtain a family of coarse-grainings. Effective information for each graining in the family is shown in the figure. There is a clear focal point, where ei is maximized (dark red), exactly where the black square intersects the spatiotemporal trajectory of the glider. Effective information is zero for locations that are too far or too close at $t = -20$ to affect the output of the orange square at $t = 0$.

Effective information thus provides a tool analogous to 
a camera focus: grainings closer to the focal point express 
glider dynamics better. 

Macroscopic texture varies with distance. The behavior of individual cells within a glider trajectory is far more complicated than the glider itself, which transitions through 4 phases as it traverses its diagonal trajectory (Fig. 4A). Does coarse-graining quantifiably simplify dynamics?
Figure 5: Macro-alphabets as a function of distance. (A): Consider two families of coarse-grainings with channel and ground as in Fig. 4. First, take the blue squares (filled and empty) as units at times $-4n$ and $0$, where $n$ is the diagonal distance between them. Second, repeat for the red squares. (B): Log-plot of the size of the filled squares' macro-alphabets as a function of $-4n$.



Fig. 5 constructs pairs of 3×3 units out of occasions at various distances from one another and computes their macro-alphabets. A 3×3 unit has a micro-alphabet of $2^9 = 512$ outputs. The macro-alphabet is found by grouping micro-outputs together into equivalence classes if their effect is the same after propagating through the channel. We find that the size of the macro-alphabet decreases exponentially as the distance between units increases, stabilizing at 5 macro-outputs: the 4 glider phases in Fig. 4A and a large equivalence class of outputs that do not propagate to the target unit and are equivalent to a blank patch of grid. A similar phenomenon occurs for pairs of 4×4 units (also shown in Fig. 5).

Continuing the camera analogy: at close range the texture of units is visible. As the distance increases, the channel absorbs more of the detail. The computational texture of the system is simpler at coarser grains, yielding a more symbolic description in which glider dynamics are described via 4 basic phases produced by a single macroscopic unit rather than $2^9$ outputs produced by 9 microscopic occasions.

Excess information detects spatial organization. So far we have only considered grainings of the Game of Life that respect its spatial organization, in effect taking the spatial structure for granted. A priori, there is nothing stopping us from grouping the 8 gray cells in Fig. 6A into a single unit that does not respect the spatial organization, since its constituents are separated in space. Are coarse-grainings that respect the grid structure quantifiably better than others?

Fig. 6A shows a coarse-graining that does not respect the grid. It constructs two units, one from both gray squares at $t = -1$ and the other from both red squares at $t = 0$. Intuitively, the coarse-graining is unsatisfactory since it builds units whose constituent occasions have nothing to do with each other over the time-scale in question. Quantitatively, excess information over the obvious partition $\mathcal{P}$ of the system into two parts is 0 bits. It is easy to show $\xi \le 0$ for any disjoint units. By comparison, the coarse-grainings in panels CD, which respect the grid structure, both generate positive excess information.
Figure 6: Detecting spatial organization. Units are the cells in the red (thick-edged) and gray (filled) squares at $t = 0$ and $t = -1$ respectively; other occasions are extrinsic noise. (A): $\xi = 0$. The coarse-graining groups non-interacting occasions into units. (B): $\xi < 0$. A blank grid is highly redundant. (CD): $\xi > 0$. Gliders perform interesting information-processing.



Thus we find that not only does our information-theoretic camera have an automatic focus, it also detects when processes hang together to form a single coherent scene.

Excess information detects gliders. Blank stretches of grid (Fig. 6B) are boring. There is nothing going on. Are interesting patches of grid quantifiably distinguishable from boring patches?

Excess information distinguishes blank grids from gliders: $\xi$ on the blank grid is negative (Fig. 6B), since the information generated by the cells is redundant, analogous to Fig. 3C. By contrast, $\xi$ for a glider is positive (Fig. 6CD), since its cells perform synergistic categorizations, similarly to Fig. 3D. Glider trajectories are also captured by excess information: varying the location of the red units (at $t = 0$) around the gray units, we find that $\xi$ is maximized in the positions shown (Fig. 6CD), thus expressing the rightwards and downwards motions of the respective gliders.

Returning to the camera analogy, blank patches of grid 
fade into (back)ground or are (transparent) channel, whereas 
gliders are highlighted front and center as units. 

Application: Hopfield networks 

Hopfield networks embed energy landscapes into their connectivity. For any initial condition they tend to one of a few troughs in the landscape, the attractors (Hopfield 1982; Amit 1989). Although cells in Hopfield networks are quite different from neurons, there is evidence suggesting neuronal populations transition between coherent distributed states similar to attractors (Abeles et al. 1995; Jones et al. 2007).
Table 1: Analysis of unidirectionally coupled Hopfield networks $A \to B$, each containing 8 cells. The networks and coupling embed attractors {00001111, 00110011, 01010101} and their mirrors. Temperature is $T = 0.25$. A sample run is analyzed using two coarse-grainings: INT captures B's effect on itself and EXT captures A's effect on B; see text.

Attractors are population-level phenomena. They arise because of interactions between groups of cells (no single cell is responsible for their existence), suggesting that coarse-graining may reveal interesting features of attractor dynamics.

Effective information detects causal interactions. Table 1 analyzes a sample run of unidirectionally coupled Hopfield networks $A \to B$. Network A is initialized at an unstable point in the energy landscape and B in an attractor. A settles into a different attractor from B and then shoves B into the new attractor over a few time steps. Intuitively, A only exerts a strong force on B once it has settled in an attractor and before B transitions to the same attractor. Is the force A exerts on B quantitatively detectable?

Table 1 shows the effects of A and B respectively on B by computing ei for two coarse-grainings constructed for each transition $t \to t+1$. Coarse-graining INT sets cells in B at $t$ and $t+1$ as units and A as extrinsic noise. EXT sets cells in A at $t$ and B at $t+1$ as units and fixes B at time $t$ as ground.

INT generates higher ei for all transitions except $1 \to 2 \to 3$, precisely when A shoves B. Effective information is high when an output is sensitive to changes in an input, so it is unsurprising that B is more sensitive to changes in A exactly when A forces B out from one attractor into another. Analyzing other sample runs (not shown) confirms that ei reliably detects when A shoves B out of an attractor.

Macroscopic mechanisms depend on the ground. Fixing the ground incorporates population-level biases into a coarse-grained cellular automaton's information-processing.

The ground in coarse-graining EXT (i.e. the output of B at $t-1$) biases the mechanisms of the units in B at time $t$. When the ground is an attractor, it introduces tremendous inertia into the coarse-grained dynamics since B is heavily biased towards outputting the attractor again. Few inputs from A can overcome this inertia, so if B is pushed out of an attractor it generates high ei about A. Conversely, when B stays in an attractor, e.g. transition $5 \to 6$, it follows its internal bias and so generates low ei about A.



Excess information detects attractor redundancy. Following our analysis of gliders, we investigate how attractors are captured by excess information. It turns out that $\xi$ is negative in all cases: the functional dependencies within Hopfield networks are redundant. An attractor is analogous to a blank Game of Life grid where little is going on. Thus, although attractors are population-level phenomena, we exclude them as emergent processes.

Excess information expresses attractor transitions. We therefore refine our analysis and compute the subset of units at time $t$ that maximizes $\xi$; maximum values are shown in Table 1. We find that the system decomposes into pairs of occasions with low $\xi$, except when B is shoved, in which case larger structures of 5 occasions emerge. This fits prior analysis showing that transitions between attractors yield more integrated dynamics (Balduzzi and Tononi 2008) and suggestions that cortical dynamics is metastable, characterized by antagonism between local attractors (Friston 1997).

Our analysis suggests that transitions between attractors are the most interesting emergent behaviors in coupled Hopfield networks. How this generalizes to more sophisticated models remains to be seen.

Emergence 

The examples show we can quantify how well a graining expresses a cellular automaton's dynamics. Effective information detects glider trajectories and also captures when one Hopfield network shoves another. However, ei does not detect whether a unit is integrated. For this we need excess information, which compares the information generated by a mechanism to that generated by its submechanisms. Forming units out of disjoint collections of occasions yields $\xi = 0$. Moreover, boring units (such as blank patches of grid or dead-end fixed-point attractors) have negative $\xi$. Thus, $\xi$ is a promising candidate for quantifying emergent processes.

This section formalizes the intuition that a system is emergent if its dynamics are better expressed at coarser spatiotemporal granularities. The idea is simple. Emergent units should generate more excess information, and have more excess information generated about them, than their sub-units. Moreover, emergent units should generate more excess information than neighboring units (recall Fig. 4).

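Stating the definition precisely requires some notation. Let $src_{U_l} := \{U_l\} \cup \{U_k \mid k \to l\}$ and similarly for $trg_{U_l}$. Let $\mathcal{J}$ be a subgraining of $\mathcal{K}$, denoted $\mathcal{J} \preceq \mathcal{K}$, if for every $U_j \in \mathcal{J}$ there is a unit $U_k \in \mathcal{K}$ such that $U_j \subseteq U_k$. We compare mechanism $m \subset \mathcal{K}$ with its subgrains via

$$\xi_{\mathcal{J}/\mathcal{K}}(m, x_{out}) := ei_{\mathcal{K}}(m, x_{out}) - \sum_j ei_{\mathcal{K}}(m^j, x_{out}),$$

where $m^j = m \cap src_{U_j}$ and $ei_{\mathcal{K}}$ signifies that effective information is computed over $\mathcal{K}$ using micro-alphabets.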
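The subgraining relation $\mathcal{J} \preceq \mathcal{K}$ is a containment check once units are represented as sets of occasions. A minimal sketch; the function name and the example grainings are hypothetical.

```python
def is_subgraining(J, K):
    """True iff every unit of J is contained in some unit of K (J is a subgraining of K)."""
    return all(any(set(u) <= set(v) for v in K) for u in J)


# Hypothetical grainings; units are sets of occasions (cell, t).
K = [{(0, -1), (1, -1)}, {(0, 0), (1, 0)}]
print(is_subgraining([{(0, -1)}, {(1, 0)}], K))  # True
print(is_subgraining([{(1, -1), (0, 0)}], K))    # False: the unit straddles two units of K
```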



Definition (emergence). Fix cellular automaton $X$ with output $x_{out}$. Coarse-graining² $\mathcal{K}$ is emergent if it satisfies conditions E1 and E2.

E1. Each unit $U_l \in \mathcal{K}$ generates excess information about its sources and has excess information generated about it by its targets, relative to subgrains $\mathcal{J} \preceq \mathcal{K}$:

$$0 < \xi_{\mathcal{J}/\mathcal{K}}(src_{U_l}, x_{out}) \quad \text{and} \quad 0 < \xi_{\mathcal{J}/\mathcal{K}}(trg_{U_l}, x_{out}). \qquad (8)$$

E2. There is an emergent subgrain $\mathcal{J} \preceq \mathcal{K}$ such that (i) every unit of $\mathcal{K}$ contains a unit of $\mathcal{J}$ and (ii) neighbors $\mathcal{K}'$ (defined below) of $\mathcal{K}$ with respect to $\mathcal{J}$ satisfy

$$\xi_{\mathcal{J}/\mathcal{K}'}(src_{U'}, x_{out}) < \xi_{\mathcal{J}/\mathcal{K}}(src_{U}, x_{out}) \qquad (9)$$

for all $U \in \mathcal{K}$, and similarly for $trg$'s.

If $\mathcal{K}$ has no emergent subgrains then E2 is vacuous.

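Grain $\mathcal{K}'$ is a neighbor of $\mathcal{K}$ with respect to $\mathcal{J} \preceq \mathcal{K}$ if for every $U \in \mathcal{K}$ there is a unique $U' \in \mathcal{K}'$ satisfying

N1. there is a unit $T \in \mathcal{J}$ such that $T \subseteq U \cap U'$ and $src_T \subseteq src_U \cap src_{U'}$, and similarly for $trg$; and

N2. the alphabet of $U'$ is no larger than that of $U$: $\big|\prod_{k \in U'} A_k\big| \le \big|\prod_{k \in U} A_k\big|$, and similarly for the combined alphabets of their sources and targets respectively.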

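The graining $\mathcal{E}_X$ that best expresses $X$ outputting $x_{out}$ is found by maximizing normalized excess information:

$$\mathcal{E}_X(x_{out}) := \arg\max_{\{\mathcal{K} \,\mid\, \mathcal{K} \text{ emergent}\}} \frac{\xi(\mathcal{K}, x_{out})}{N^{\mathcal{K}}_{MIP}}. \qquad (10)$$

Here, $N^{\mathcal{K}}_{MIP}$ is the normalizing constant found when computing the minimum information partition for $\mathcal{K}$.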

Some implications. We apply the definition to the Game of Life to gain insight into its mechanics.

Condition E1 requires that interactions between units and their sources (and targets) are synergistic (Fig. 6CD). Units that decompose into independent pieces (Fig. 6A) or perform highly redundant operations (Fig. 6B) are therefore not emergent.

Condition E2 compares units to their neighbors. Rather than build the automaton's spatial organization directly into the definition, neighbors of $\mathcal{K}$ are defined as coarse-grainings whose units overlap with $\mathcal{K}$ and whose alphabets are no bigger. Coarse-grainings with higher $\xi$ than their neighbors are closer to focal points; recall Fig. 4 and Fig. 6CD, where $\xi$ was maximized for units respecting glider trajectories. Beer (2004) gives an analysis of glider boundaries similar in spirit to this paper.



² Ground output $s^*_G$ is $x_{out}$ restricted to ground occasions.



Finally, Eq. (10) picks out the most expressive coarse-graining. The normalization plays two roles. First, it biases the optimization towards grainings whose MIPs contain few, symmetric parts, following Balduzzi and Tononi (2008). Second, it biases the optimization towards systems with simpler macro-alphabets. Recall (Fig. 5) that coarse-graining produces more symbolic interactions by decreasing the size of alphabets. Simplifying alphabets typically reduces effective and excess information since there are fewer bits to go around. The normalization term rewards simpler levels of description, so long as they use the bits in play more synergistically.

Discussion 

In this paper we introduced a flexible, scalable coarse-graining method that applies to any cellular automaton. Our notion of automaton applies to a broad range of systems. The constraints are that they (i) decompose into discrete components with (ii) finite alphabets where (iii) time passes in discrete tics. We then described how to quantify the information generated when a system produces an output (at any scale) both as a whole and relative to its subsystems. An important feature of our approach is that the output $x_{out}$ of a graining is incorporated into the ground and also directly influences ei and $\xi$ through computation of the actual repertoires. Coarse-graining and emergence therefore capture some of the suppleness of biological processes (Bedau 1997): they are context-dependent and require many ceteris paribus clauses (i.e. background) to describe.

Investigating examples taken from Conway's Game of Life and coupled Hopfield networks, we accumulated a small but significant body of evidence confirming the principle that expressive coarse-grainings generate more information relative to sub-grainings. Finally, we provisionally defined emergent processes. The definition is provisional since it derives from analyzing a small fraction of the possible coarse-grainings of only two kinds of cellular automata.

Hopfield networks and the Game of Life are simple models capturing some important aspects of biological systems. Ultimately, we would like to analyze emergent phenomena in more realistic models, in particular of the brain. Conscious percepts take 100-200 ms to arise, and brain activity is (presumably) better expressed as comparatively leisurely interactions between neurons or neuronal assemblies rather than much faster interactions between atoms or molecules (Tononi 2004). To apply the techniques developed here to more realistic models we must confront a computational hurdle: the number of coarse-grainings that can be imposed on large cellular automata is vast. Nevertheless, the approach developed here may still be of use. First, manipulating macro-alphabets provides a method for performing approximate computations on large-scale systems. Second, for more fine-grained analysis, initial estimates about which coarse-grainings best express a system's dynamics can be fine-tuned by comparing them with neighbors.